Bilinear Attention Networks
Kim, Jin-Hwa, Jun, Jaehyun, Zhang, Byoung-Tak
Attention networks in multimodal learning provide an efficient way to utilize given visual information selectively. However, the computational cost of learning attention distributions for every pair of multimodal input channels is prohibitively expensive. To work around this problem, co-attention builds two separate attention distributions, one per modality, neglecting the interaction between multimodal inputs. In this paper, we propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly. BAN considers bilinear interactions among two groups of input channels, while low-rank bilinear pooling extracts the joint representations for each pair of channels. Furthermore, we propose a variant of multimodal residual networks to exploit the eight attention maps of BAN efficiently. We quantitatively and qualitatively evaluate our model on the visual question answering (VQA 2.0) and Flickr30k Entities datasets, showing that BAN significantly outperforms previous methods and achieves new state-of-the-art results on both.
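The low-rank bilinear attention the abstract describes can be sketched in a few lines of NumPy. This is a toy illustration with made-up dimensions: `U`, `V`, and `p` stand in for the learned low-rank projections and pooling vector, and ReLU is assumed as the nonlinearity; the paper's full model adds glimpses, residual connections, and learned classifiers on top.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy sizes: N text channels, M visual channels, low-rank joint dim K.
N, M = 4, 6
dx, dy, K = 8, 10, 16

X = rng.standard_normal((N, dx))   # e.g., question word features
Y = rng.standard_normal((M, dy))   # e.g., image region features
U = rng.standard_normal((dx, K))   # low-rank projection for X (assumed learned)
V = rng.standard_normal((dy, K))   # low-rank projection for Y (assumed learned)
p = rng.standard_normal(K)         # pooling vector (assumed learned)

# Bilinear attention logits for every pair (i, j):
#   logits[i, j] = p^T (ReLU(U^T x_i) * ReLU(V^T y_j))
Xu = np.maximum(X @ U, 0.0)        # (N, K)
Yv = np.maximum(Y @ V, 0.0)        # (M, K)
logits = (Xu * p) @ Yv.T           # (N, M)

# One attention map: a distribution over all N*M channel pairs.
A = softmax(logits.ravel()).reshape(N, M)

# Low-rank bilinear pooling under the attention map gives the joint feature:
#   f[k] = sum_ij A[i, j] * Xu[i, k] * Yv[j, k]
f = np.einsum('ij,ik,jk->k', A, Xu, Yv)  # (K,)
```

The key efficiency point is visible in the shapes: the pairwise interaction is never materialized as an N×M×K tensor; two K-dimensional projections and one matrix product suffice.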
Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering
Zhang, Zhilin, Wang, Jie, Zhu, Ruiqi, Gong, Xiaoliang
Medical Visual Question Answering (MedVQA) has gained increasing attention at the intersection of computer vision and natural language processing. Its capability to interpret radiological images and deliver precise answers to clinical inquiries positions MedVQA as a valuable tool for supporting physicians' diagnostic decision-making and alleviating the workload on radiologists. While recent approaches focus on unified pre-trained large models for multi-modal fusion, such as cross-modal Transformers, research on more efficient fusion methods remains relatively scarce in this discipline. In this paper, we introduce a novel fusion model that integrates Orthogonality loss, Multi-head attention and Bilinear Attention Network (OMniBAN) to achieve high computational efficiency and strong performance without pre-training. We conduct comprehensive experiments and clarify how bilinear attention fusion can be enhanced to approach the performance of large models. Experimental results show that OMniBAN outperforms traditional models on key MedVQA benchmarks while maintaining a lower computational cost, indicating its potential for efficient clinical application in radiology and pathology image question answering.
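Of the three ingredients named in the abstract, the orthogonality loss is the least standard. A common formulation, assumed here since the paper's exact definition is not given in the abstract, penalizes the squared Frobenius norm of the cross-correlation between two feature matrices, driving the two subspaces apart:

```python
import numpy as np

def orthogonality_loss(A, B):
    """Squared Frobenius norm of A^T B (rows = samples, columns = features).

    Zero when every feature direction of A is orthogonal to every
    feature direction of B. This is one common formulation; OMniBAN's
    exact loss may differ.
    """
    return float(np.linalg.norm(A.T @ B, ord='fro') ** 2)

# Toy example: features living in disjoint directions incur zero penalty,
# while identical features incur a positive one.
A = np.array([[1.0, 0.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [0.0, 1.0]])
zero_loss = orthogonality_loss(A, B)       # 0.0: subspaces are orthogonal
pos_loss = orthogonality_loss(A, A)        # > 0: fully overlapping
```

In a fusion model this penalty is typically added to the task loss with a small weight, encouraging, for example, modality-specific and shared representations to carry complementary information.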
AI could speed up discovery of new medicines
Artificial intelligence that could reduce the cost and speed up the discovery of new medicines has been developed as part of a collaboration between researchers at the University of Sheffield and AstraZeneca. The new technology, developed by Professor Haiping Lu and his Ph.D. student Peizhen Bai from Sheffield's Department of Computer Science, with Dr. Filip Miljković and Dr. Bino John from AstraZeneca, is described in a new study published in Nature Machine Intelligence. The study demonstrates that the AI, called DrugBAN, can predict whether a candidate drug will interact with its intended target protein molecules inside the human body. AI that can predict whether drugs will reach their intended targets already exists, but the technology developed by the researchers at Sheffield and AstraZeneca can do this with greater accuracy and also provides useful insights to help scientists understand how drugs engage with their protein partners at a molecular level, according to the paper published on February 2, 2023. AI has the potential to inform whether a drug will successfully engage an intended cancer-related protein, or whether a candidate drug will bind to unintended targets in the body and lead to undesirable side effects for patients.